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Mechanistic dynamic models of biochemical networks such as Ordinary Differential Equations 
(ODEs) contain unknown parameters like the reaction rate constants and the initial concentrations 
of the compounds. The large number of parameters as well as their nonlinear impact on the model 
responses hamper the determination of confidence regions for parameter estimates. At the same 
time, classical approaches translating the uncertainty of the parameters into confidence intervals for 
model predictions are hardly feasible. 

In this article it is shown that a so-called prediction profile likelihood yields reliable confidence 
intervals for model predictions, despite arbitrarily complex and high-dimensional shapes of the 
confidence regions for the estimated parameters. Prediction confidence intervals of the dynamic 
states allow a data-based observability analysis. The approach renders the issue of sampling a high- 
dimensional parameter space into evaluating one-dimensional prediction spaces. The method is also 
applicable if there are non-identifiable parameters yielding to some insufficiently specified model 
predictions that can be interpreted as non-observability. Moreover, a validation profile likelihood is 
introduced that should be applied when noisy validation experiments are to be interpreted. 

The properties and appficabifity of the prediction and validation profile likelihood approaches 
are demonstrated by two examples, a small and instructive ODE model describing two consecutive 
reactions, and a realistic ODE model for the MAP kinase signal transduction pathway. The pre- 
sented general approach constitutes a concept for observability analysis and for generating reliable 
confidence intervals of model predictions, not only, but especially suitable for mathematical models 
of biological systems. 



I. INTRODUCTION 



The major steps of the process of mathematical mod- 
eling comprise model discrimination, i.e. identification 
of an appropriate model structure, model calibration, 
i.e. estimation of unknown model parameters, as well as 
prediction and model validation. For all these topics it is 
essential to have appropriate methods assessing the cer- 
tainty or ambiguity of any result for given experimental 
information. 

For parameter estimation, there are several approaches 
to derive confidence intervals, like standard intervals 
which are based on an estimate of the cova riance ma- 
trix of the parameter esti mates ISachsl984l. bootstrap 
based coii fidence intervals Davisonl997l lDiCicciol987l 
|Joshi2006l. as we l l as l ikelihood based confidence inter- 
vals Venzonl988l lRaue 2009] . For model discrimination, 
significance statements can be obtained by statistical 
tests. However, for model predictions, there are still de- 
mands for methodology that is applicable for mathemat- 
ical models like ODEs used to describe the dynamics of a 
syste m in a variety o f scientific fields e.g. in molecular bi- 
ology Hlavacek2009l ISwameve2Q03 1 . but also in medical 
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research, chemistry, engineering, and physics. 

The mere estimation of parameters is often not the 
final aim of an investigation. More frequently, it is 
desired to utilize the parametrized model to generate 
model predictions such as the dynamic behavior of un- 
observed components. Classically, the uncertainty in the 
model parameters is attempted to be translated into cor- 
responding prediction confidence intervals. For models 
that depend linearly on the model parameters, as it oc- 
curs in classical regression models, this is well studied and 
known as propagation of uncertainty based on standard 
errors. This approach is appropriate and sufficient for 
many applications. However, e.g. for biochemical net- 
works, the model responses depend nonlinear ly on the 
model parameters. Here, the boundaries of the param- 
eter confidence region can exhibit arbitrarily complex 
shape and are usually difficult to translate into bound- 
aries for the prediction confidence intervals. Therefore, 
established approaches aim to scan the entire parameter 
subspace which is in a sufficient agreement with the ex- 
perimental data to propagate the parameters confidence 
regions into confidence intervals for the model predic- 
tions. The major challenge is the complex nonlinear 
interrelation between parameters and model responses 
which requires that the parameter space has to be densely 
sampled to capture all scenarios of model predictions. 
For models with tens to hundreds of parameters this is 
numerically demanding or even infeasible because high 



dimensional spaces cannot be densely sampled. This is 
issue oft en referred to the curse of dimensionality in lit- 
erature [Marimontl979l . IScottl99ll |. 

There are several methods for an approximate 
sampling of the parameter space, e.g. the Markov 



Chain M onte Carlo (MCMC) methods lGelman2003 . 
Kassl998l, and bootstrap based approaches |Joshi 2006 , 
Molinaro2005j . However, for the ODE models used to 
describe interaction networks, these methods are numer- 
ically very demanding and provide only very rough ap- 
proximations. Therefore, it is difficult to control the cov- 
erage of the prediction confidence intervals for these ap- 
proaches. Moreover non-identifiable parameters are not 
explicitly considered hampering the convergence of these 
sampling techniques and yielding results that are ques- 
tionable and difficult to interpret [Bavarri2004j . 

Conceptually related to the p rediction p r ofile likeli- 
hood approach presented here, |Raue2009l lRaue2010l | 
presented an approach for the determination of confi- 
dence intervals for the model parameters by sampling 
the parameter profile likelihood. For their approach the 
problems concerning identifiability are resolved and the 
computation is feasible also for high-dimensional param- 
eter spaces. Nevertheless, the direct translation to confi- 
dence intervals of the model trajectories only works as an 
approximation yielding a coverage rate that is sometimes 
lower than desired. 

The idea of the prediction profile likelihood presented 
here is to determine prediction confidence without an 
explicit sampling strategy for the parameter space. In- 
stead, a certain fixed value for a prediction is used as a 
nonlinear constraint and the parameter values are cho- 
sen via constraint optimization of the likelihood. This 
does neither require a unique solution in terms of iden- 
tifiability nor confidence intervals for the parameter es- 
timates. The constraint maximum likelihood approach 
checks the agreement of a predicted value with the ex- 
perimental data. By repeating this procedure for con- 
tinuous variations of the predicted value, a prediction 
profile-likelihood is obtained. Thresholding the predic- 
tion profile likelihood yields statistically accurate confi- 
dence intervals. The desired level of confidence which co- 
incides with the level of agreement with the experiments 
is controlled by the threshold. 

The theoretical background of the prediction profile 
likelihood, also called predic tive likelihood has been al- 
ready studied |Hinklevl979l |. Moreover, related ideas 
are already applie d in the co ntext of generalized lin- 
ear mixed models Boothl998j . unobserved data points 
p3utlerl98(|. The linear approxi mation has b een applied 
in nonlinear regression analyses jCoolevl989l |. A review 
of prediction profile likelihood approaches and a modi- 
fication t o sufficiency-ba sed predictive likelihood is pro- 
vided in JBiornstadl990l |. 

In this paper, this concept is applied to ODE models 
occurring in mechanistic dynamic models, e.g. in molec- 
ular and systems biology. In this context the approach 
allows a data-based observability analysis. Moreover, it 



is extended to obtain confidence intervals for validation 
experiments. 



II. RESULTS 

A. Small illustration model 

First, a small but illustrative model of two consecutive 
reactions 



A '^ B % C 



(1) 



with rates 9i — 0.05,^2 = 0.1 and initial conditions 
yl(0) = 03 = 1,5(0) = 0,C(0) = is utilized to illus- 
trate our approach. For this purpose, it is assumed that 
C{t) is measured at i = 0, 10, . . . , 100. 

For the simulated measurements, Gaussian noise e ~ 
A''(0, cr^) with a = 0.1 has been assumed which corre- 
sponds to a typical signal-to-noise ratio for applications 
in cell biology of around 10%. If an experimental setup 
would not allow for negative measurements, a log-normal 
distribution of the observational noise could be more ap- 
propriate. Then, the Gaussian setting is obtained after a 
log-transformation of the data [Kreutz2007j . Such trans- 
formations and preprocessing procedures would have to 
be performed before the analysis starts. 

Panel (a) in Fig. [T] shows the dynamics of A{t), B{t), 
and C{t) as well as a typical noise realization. Such 
simulated data realizations are utilized to calculate the 
prediction- and validation profile likelihood for the dy- 
namic states. 

Panel (b) shows, as an example, the prediction pro- 
file likelihood and the validation profile likelihood for the 
same noise realization for predicting A(t) at time point 
t = 10. The validation profile likelihood has been calcu- 
lated for validation data with 10% measurement noise, as 
it was assumed for the measurements The vertical axis is 
minus two times the log-likelihood which corresponds to 
the residual sum of squares. For illustration purposes, the 
minimum of the log-likelihood LL* is shifted to zero in 
all figures. Three thresholds corresponding to 68%, 90%, 
and 99% confidence levels are plotted as horizontal lines. 
As explained in the section 'Materials and Methods', the 
projections to the horizontal axis yields the respective 
confidence intervals for a prediction or for a validation 
experiment. The constraint optimization procedure is 
infeasible for A{t) < and therefore the PCIs automati- 
cally account for strictly positive values of A. 

The calculation of the prediction and validation con- 
fidence intervals has been repeated for t — 0,10, ... , 100 
and all three dynamic states A{t), B{t), C{t). In pan- 
els (c)-(e), the respective prediction confidence intervals 
(PCIs) are plotted as well as the prediction profile like- 
lihood. The corresponding validation profile likelihood 
functions and the respective validation confidence inter- 
vals are shown in the supplemental information (SI). For 
plotting the confidence intervals along the time axis, the 
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FIG. 1: The three figures in panel (a) show the dynamics and measurement realization for the small model used for illustration 
purpose. C(t) is measured and the dynamics of all states, i.e. A(t), B(t), and C(t), is intended to be predicted. Panel (b) 
shows as an example the prediction profile likelihood (gray dashed curve) and validation profile likelihood (black dashed curve) 
of A(t=10). Thresholding yields confidence intervals for prediction (gray vertical lines) and validation (black vertical lines). 
The three thresholds and the respective projections correspond to a— 68%, 90%, and 99% confidence intervals. The VCIs are 
larger than the PCIs, because they account for the measurement error of a validation data point. Panels (c)-(e) show prediction 
confidence intervals (gray) for the unobserved states A(t), B(t), as well as for the measured state C(t). The prediction profile 
likelihood functions are plotted as black curves in vertical direction. Non-observability is illustrated in panels (f)-(h). Panel 
(f) shows a realization of the measurements for a design which does not provide sufficient information about the steady state 
of C. This leads to a flat prediction proflle likelihood for large values for A(t) as shown in panel (g), as well as for B(t) for 
t>0 as plotted in panel (h). A flat prediction proflle likelihood in turn yields unbounded prediction and validation confldence 
intervals and non-observability of A(t) and B(t) as indicated by the gray shaded regions. 



PCIs evaluated the eleven time points have been inter- 
connected by cubic piecewise interpolation. The dis- 
played confidence intervals constitute the propagation of 
information from the measurements of C{t) to predic- 
tions of the dynamics of the compound concentrations. 
Because C is the measured compound in our example, 
the prediction confidence intervals for C are much smaller 
than for A and B. However, also A and B yield bounded 
prediction confidence intervals which can be interpreted 
as observability of these dynamic states. 

In the Appendix, the reliability of our confidence in- 
tervals is investigated by calculating the coverage, i.e. by 
comparing the confidence level with the frequency that 
the true value is inside the prediction confidence inter- 



val. For our example, it will be demonstrated that confi- 
dence intervals using the asymptotic threshold sometimes 
yield slightly conservative intervals. Also an algorithm to 
improve the threshold is provided which yields slightly 
smaller confidence intervals with the correct coverage. 

To illustrate the effect of non-observability, the as- 
sumption about the available experimental information 
is slightly changed. The measurements are simulated 
for earlier and closer time steps, i.e. for i = 0, 2, . . . , 20. 
Panel (f) of Fig. [T] shows that these time points sam- 
ple only the transient increase of C(i). Hence, such a 
design does not provide sufficient information about the 
steady state level of C In other words, the modification 
limits the available information about the total amounts 



of the compounds. This, in turn, makes A{t) and B{t) 
non-observable. 

Panel (g) shows the prediction confidence intervals 
for A{t). In the chosen setting, the predictions are 
unbounded towards infinity and therefore A(t) is non- 
observable. In panel (h), it is also shown that B{t) is 
non-observable. According to the model definition, B{0) 
is known to be zero, but for i > 0, unbounded predic- 
tion confidence intervals are obtained which indicate non- 
observability of B{t). 



tration dynamics is only predicted within rather large 
intervals which for most time points nearly span a range 
between zero and 100 nM. 

The largest absolute size of the prediction confidence 
interval of 176 nM is obtained for Erk** after 181 sec- 
onds. This is the point in time where the negative feed- 
back is activated. Additional experimental investigation 
of this condition is very informative to further specify the 
dynamic behavior of the MAP kinase cascade in our ex- 
ample. Further considerations concerning experimental 
planning are provided in detail in the Appendix. 



B. MAP kinase signaling model 

Next, an ODE model of cellular signal transduction has 
been used to illustrate our method in a realistic setting. 
For this purpose, a model of the mitogen- activated protein 
(MAP) kinases which is one of the most extensively stud- 
ied sig nal tran sduction pathway, is utilized. The chosen 
model |Kholod enko2000j consists of eight dynamic states 
describing the time dependency of the MAP kinases Raf, 
RaP, Mek, Mek*, Mek**, Erk, Erk*, and Erk** which 
play a very prominent role in many cellular processes, 
e.g. in cell proliferation. A star '*' denotes phosphoryla- 
tion of the protein which biologically acts as activation. 

Panel (a) in Fig. [2] provides a summary of the MAP 
kinase signaling pathway. The enzymatic reactions in 
the ODE model are described as Michaelis-Menten rate 
equations, i.e. each reaction is parametrized with a max- 
imal enzymatic rate and a Michaelis constant. As in the 
original publication, the parameters of the two consecu- 
tive phosphorylation and dephosphorylation steps of Mek 
and Erk are assumed to be identical and the initial con- 
centrations are assumed to be known. In this setting, 
14 parameters are estimated out of three times eleven 
data points. Details about the model are provided in the 
Appendix. 

It is assumed that the total amount of the phosphory- 
lated forms for each protein, i.e. Raf*, the sum of Mek* 
and Mek** as weh as the sum of Erk* and Erk**, are 
measured. This observational assumption holds for ex- 
ample for phospho-specific antibodies such as utilized 
for Western blotting. The measurement times are set 
to 0, 100, . . . , 1000 seconds. Again, additive Gaussian 
noise is assumed. The standard deviation has been set 
to o- = 10 nM. 

In panel (b) of Fig. [5] a typical noise realization is dis- 
played. Panel (c) shows the prediction confidence inter- 
vals (dark gray) and the validation confidence intervals 
(light gray) for this noise realization calculated for all 
dynamic states. The size of the confidence intervals is 
plotted as a dashed-dotted line. 

The prediction confidence intervals show how precisely 
the dynamics is inferred by the available data. The tem- 
poral behavior of Raf, Raf* is quite well determined, 
i.e. the size of the PCI is below 40 nM. Similarly, the 
unphosphorylated states of Mek and Erk have narrow 
prediction confidence intervals. For Mek* the concen- 



III. DISCUSSION 

Existing approaches for prediction confidence intervals 
like MCMC [Marioram2003] or bootstrap procedures are 
based on forward evaluations of the model for many pa- 
rameter values. This works reasonably well for a low 
dimensional parameter space and if the target density 
function, i.e. the paramet er space to be sampled, is 
well-behaved Bavarri2004J . However, sampling nonlin- 
ear high-dimensional spaces densely is impractical and it 
is almost impossible to ensure that sampling the param- 
eter space covers all prediction scenarios. Especially in 
biological applications the target distributions frequently 
inherit strong and nonlinear functional relations. In the 
case of non-identifiability, the parameter space to be sam- 
pled is not restricted rendering convergence near to im- 
possible. 

In this paper, we applied a contrary procedure. The 
model prediction space is sampled directly and the corre- 
sponding model parameters are determined by constraint 
maximum likelihood to check the agreement of the pre- 
dictions with the data. This concept yields the predic- 
tion profile likelihood which constitutes the propagation 
of uncertainty from experimental data to predictions. 

If a comprehensive prior, i.e. for all parameters, would 
be available, a Bayesian procedure like MCMC where 
marginalization, i.e. integration over the nuisance dimen- 
sions is feasible could have superior performance. How- 
ever, in cell biology applications, prior knowledge is very 
restricted because kinetic rates and concentrations are 
highly dependent on the cell type and biological context, 
e.g. on the cellular environment and biochemical state 
of a cell. Therefore, there is usually at most some prior 
information for few parameters available. Such prior in- 
formation can be incorporated in our procedure with- 
out restricting its applicability by generalizing maximum 
likelihood estimation to maximum a-posterior estimation 
as discussed in the Appendix. 

In general, generating prediction confidence intervals 
given the uncertainty in the high-dimensional nonlin- 
ear parameter space requires large numerical efforts. 
However, this complication primarily originates from 
the complexity of the issue itself rather than from the 
methodological choice. In fact, the aim is approached 
by the prediction profile likelihood in a very efficient 




FIG. 2: Panel (a) shows the MAP kinase model according to [Kholodenko2000l ] . It is assumed that the phosphorylated 
compounds are measured. The dynamics of all compounds is intended to be predicted to illustrate the prediction profile 
likelihood approach. In panel (b) the dynamics of the MAP kinase model as well as simulated data set are plotted. The 90% 
confidence intervals of the dynamic variables for predictions (dark gray) and for validation experiments (light gray) for this 
noise realization are plotted in panel (c). The size of the PCI is plotted as a dashed-dotted line. In absolute concentrations, 
the dynamics of Erk** has the largest PCI at t=181 seconds, i.e. when the negative feedback is activated. Also, the dynamics 
of Mek* is only badly observable in our example. Measurements of both would be very informative for better calibrating the 
model. 



manner because scanning the parameter space by the 
constrained optimization procedure to explore the data- 
consistent predictions is more efficient than sampling pa- 
rameter space without considering the predictions like 
it is performed for MCMC. Instead of sampling a high- 
dimensional parameters space, only the prediction space 
has to be explored for calculating a prediction profile like- 
lihood, i.e. the optimization of the parameters reduces 
the high-dimensional sampling problem to exploring a 
single dimension. For the parameter profile likelihood, 
it has been demonstrated |Raue20Q9f that the compu- 
tational effort only scales slightly super-linear with the 
number of parameters. This result does, due to the sim- 
ilarity of the computations, carry over to the prediction 
profile likelihood. 

The prediction confidence regions introduced above 
has to be interpreted point-wise. This means that a con- 
fidence level a controls errors of type 1 which is the prob- 
ability that the model response for the true parameters 
is inside the prediction confidence interval for a single 
prediction condition if many realizations of the experi- 
mental data and the corresponding prediction confidence 
intervals are considered. 

In contrast, if a single data set is utilized to gener- 
ate many prediction intervals, e.g. predictions for sev- 
eral points in time as performed above, the results are 
statistically dependent, i.e. the realization of the PCI of 
a neighboring time point is very similar and therefore 
correlated. Therefore, the prediction confidence inter- 
vals for a compound for two adjacent points in time very 
likely both contain the true value, or neither. In such 
an example, a common prediction confidence region for 
two statistically dependent predictions would require a 
two-dimensional prediction profile likelihood. This topic, 
however, is beyond the scope of this article. 



The prediction profile likelihood also provides a con- 
cept for experimental planning. Experimental conditions 
with a very narrow prediction confidence interval are very 
accurately specified by the available data. New measure- 
ments for such a condition on the one hand does not 
provide very much additional information to better cali- 
brate the model parameters, and hence is from this point 
of view a bad choice for additional measurements. On the 
other hand, it very precisely predicts the model behav- 
ior under these certain conditions and is therefore a very 
powerful candidate setting for validating the model struc- 
ture. Contrarily, large prediction confidence intervals in- 
dicate conditions which are weakly specified by the exist- 
ing data and therefore constitute informative experimen- 
tal designs for better calibrating the model. Because a 
design optimization on the basis of the prediction profile 
likelihood does not require any linearity approximation 
like common experimental desig n techni ques, e.g. based 
on the Fisher information [Krcutz2009[ , the presented 
procedure could be very valuable for ODE models which 
are typically highly nonlinear. 

Another potential of the prediction profile likelihood 
shown in this article is its interpretation in terms of ob- 
servability. This term is very commonly used in control 
theory to characterize whether the dynamics of some un- 
observed variables can be inferred by the set of feasible 
experiments. The theory in this field is based on analyt- 
ical calculations, i.e. the limited amount and inaccuracy 
of the data is usually not considered. In this article, it 
has been shown that the prediction profile likelihood al- 
lows for a general data-based approach to check whether 
there is enough information about unobserved dynamic 
states in the given experimental design and realization of 
measurements. Therefore, in analogy to the terminology 
of practical identifiability Raue2009J, we would suggest 



to term observability for a given data set, i.e. a restricted 
prediction confidence interval, as practical observability. 

Finally, it should be noted, that a prediction could be 
any function of the compounds and the parameters. In 
applications, e.g. a ratio of two compound concentrations 
is a characteristics of interest. In principle also integrals, 
peak positions and other functions of the dynamic states 
can be considered as predictions which could be targets 
for observability considerations as well as for the calcu- 
lation of prediction and validation confidence intervals. 



IV. SUMMARY 



Generating model predictions is a major task in math- 
ematical modeling. For the dynamic mechanistic models 
as they are applied e.g. in molecular and systems biol- 
ogy, the confidence regions from parameter estimation 
can have arbitrarily complex shapes. Therefore, it is very 
difficult or even impossible to sample the parameter space 
appropriately to generate confidence intervals for predic- 
tions. This in turn impedes a data-based observability 
analysis for the dynamic states. 

In this article, the prediction profile likelihood ap- 
proach is presented as a methodology which directly cal- 
culates the set of model predictions which are consistent 
with existing measurements. This concept constitutes a 
powerful tool for assessing model predictions, perform- 
ing observability analyses, and experimental design. The 
method is feasible for arbitrary dimensions of the param- 
eter space. It only requires a proper calculation of the 
maximum likelihood value, i.e. a numerically working pa- 
rameter optimization procedure. The task of sampling a 
high-dimensional parameter space reduces to scanning a 
one-dimensional prediction space. It therefore allows the 
calculation of confidence intervals for model predictions 
as well as confidence intervals for the outcome of valida- 
tion experiments. 

The applicability of the approach has been shown by 
a small but instructive system of two consecutive reac- 
tions and a published model for MAP kinase signaling. 
For the small system, it has been shown that the predic- 
tion profile likelihood yields desired coverage properties. 
Moreover, a setting inducing non-observability has been 
investigated which is characterized by unbounded pre- 
diction confidence intervals. For the MAP kinase model, 
prediction confidence intervals and validation confidence 
intervals for all dynamic states have been determined on 
the basis of measurements of the phosphorylated pro- 
teins. In addition, the applicability of the approach for 
experimental planning has been demonstrated. 



METHODOLOGY 



A. The prediction profile likelihood 



For additive Gaussian noise e ~ N{0, a^) with known 
variance cr^, two times the negative log-likelihood 



2LL(y|0)=^ 



{v,-F{t,,u,e)Y 



const. 



(2) 



of the data y for the parameters 6 is except a constant 
offset identical to the residual sum of squares RSS(0) = 
Si {Vi ~ P{tijU,9)) ju^ . In this case, maximum likeli- 
hood estimation is equivalent to standard least-squares 
estimation 

B = argmax LL(y|6i) = argmin RSS(6') , (3) 

i.e. to minimizing the residual sum of squares. F = 
g{x{t, u, 9), 0) denotes the model response which is in our 
case given after integration of a system of differential 
equations 



i(i) = /(x(t),u(i),0) 



(4) 



with an externally controlled input function u and a map- 
ping to experimentally observable quantities 



y{t)=g{x{t),e)+e{t). 



(5) 



The parameter vector 9 comprises the kinetic parameters 
as well as the initial values, and additional offset or scal- 
ing parameters for the observ ations. 
It has been shown [Raue2009j that the profile likelihood 



PL(6'i) = maxLL(6'|2;) 



(6) 



for a parameter 9i given a data set y yields reliable con- 
fidence intervals 

GI^{9,\V) = {9, I -2PL(0,) < -2LL(y)*+zcd/(x?,a)} 

. (7) 
for the estimation of a single parameter. Here, a is the 
confidence level and icdf{xi,a) denotes the a quantile 
of the chi-square distribution with one degree of freedom 
which is given by the respective inverse cumulative den- 
sity function. LL* is the maximum of the log-likelihood 
function after all parameters are optimized. In [6], the 
optimization is performed for all parameters except 9i. 
The analogy of likelihood-based parameter and predic- 
tion confidence intervals is discussed in the Appendix. 
The desired coverage 



Prob(6'i eCI„(6l,)) =a 



(8) 



i.e. the probability that the true parameter value is in- 
side the confidence interval, holds for [7] if the magnitude 
of the decrease of the residual sum of squares by fitting 
of 9i is Xi distributed. This is given asymptotically as 



well as for linear paramete rs and is a good approx imation 
under weak assumptions |Federl968l ISeberl989| . If the 
assumptions are violated, the distribution of the mag- 
nitude of the decrease has to be generated empirically, 
i.e. by Monte-Carlo simulations, as discussed in the Ap- 
pendix. 

The experimental design D = {t^g^u} comprises all en- 
vironmental conditions which can be controlled by the 
experimenter like the measurement times i, the observ- 
ables g, and the input functions u. A prediction z = 
F{Dp^cdiS) is the response of the model F for a predic- 
tion condition £)prod = {tprcd,5prod,Uprod} specifying a 
prediction observable g-prcd evaluated at time point iprcd 
given the externally controlled stimulation Uprcd- 
In some cases the observable g-prcd corresponds to mea- 
suring a dynamic variable x{t) directly, i.e. it corresponds 
to a compound whose concentration dynamics is modeled 
by the ODEs. In a more general setting the observable 
is defined by an observational function gpj-cd{x{t) , 9) de- 
pending on several dynamic variables x. Therefore, ^prod 
does neither have to coincide with a dynamic variable 
nor with an observational function g of the measurements 
performed to build the model. 

In analogy to [8], the desired property of a prediction 
confidence interval PCla{D\y) derived from an experi- 
mental data set y with a given significance level a is that 
the probability 



Proh{F{Dp„ 



cd; "true. 



ePCl^{D\y))=a 



(9) 



that the true value of -F(I?prod,^truc) is inside the pre- 
diction confidence interval PCIq. is equal to a. In other 
words, the PCI covers the model response for the true 
parameters with a proportion a of the noise realizations 
which would yield different data sets y. 
The prediction profile likelihood 



PPL(z) = max 



LL{y\d) (10) 



In analogy to profile likelihood for parameter estimates, 
the significance level determines the threshold for the 
PPL, which is given a symptotically by the quantiles [7] 
of the Xi distribution [Meekerl995'| . In the Appendix, a 
Monte-Carlo algorithm is presented which can be used 
to calculate the threshold in cases where the asymptotic 
assumption is violated. 



B. The validation profile likelihood 

Likelihood-based confidence interval like [7] or [TT] cor- 
respond to the region where a likelihood ratio test would 
not reject the model. Having a prediction confidence in- 
terval, the question arises whether a model has to be 
rejected if a validation measurement is outside the pre- 
dicted interval. This, in fact, would hold if a "perfect" 
validation measurement would be available, i.e. a data 
point without measurement noise. For validation experi- 
ments, however, the outcome is always noisy and is there- 
fore expected to be more frequently outside the PCI than 
the true value. Hence, the prediction confidence interval 
|11| has to be generalized for application to a validation 
experiment. 

For a validation experiment, we therefore introduce a val- 
idation profile likelihood VPL and a corresponding val- 
idation confidence interval VCI^ in the following. In 
such a setting, a confidence interval should have a cover- 
age 



is obtained by maximization over the model parame- 
ters satisfying the constraint that the model response 
F{D,9*) after fitting is equals to the considered value 
z for the prediction. The prediction confidence interval 
is in analogy to [7] given by 

PCI„pp,ed|2/) = {z\ - 2PPL(z) < -2LL*(y) + icdf{xl, a)} 

(11) 
i.e. the set of predictions z = F{Dp,:ed,6) for which 
the PPL is below a threshold given by the Xi distribu- 
tion. In analogy to likelihood based confidence intervals 
for parameters, such PCI yields the smallest unbiased 
confidence intervals for predictions for given coverage a 
[Coxl994J . 

Instead of sampling a high-dimensional parameter space, 
the prediction profile likelihood calculation comprises 
sampling of a one-dimensional prediction space by evalu- 
ating several predictions z. Evaluating the maximum of 
the likelihood satisfying the prediction constraint does in 
general not require an unambiguous point in the parame- 
ter space as in the case of structural non-identifiabilities. 



Prob(z G VCIgD(i?vaii|y)) - a (12) 

for the validation data point z ^ A^(/i, SD^) with ex- 
pectation n = F{Dvaii,0truc) and variance SD . Here, 
-Dvaii denotes the design for the validation experiment. 
A validation confidence interval satisfying [12] allows a 
rejection of the model if a noisy validation measurement 
with error SD is outside the interval. 
VCI„ for validation data can be calculated by relaxing 
the constraint [TU] used to compute the prediction pro- 
file likelihood. Because in this case, the model prediction 
does not necessarily have to coincide with the data point 
z. Instead, the deviation from the validation data point 
is penalized equivalently to the data y. The agreement 
of the model with the data y and the validation measure- 
tacnt z is then given by 



LLiz,y\0) 



-F{D. 



vali; 



SD 



We now define the validation profile (log-)likelihood 



(13) 



VPL=^(z|y) = LL*{z,y) = LL(z,y|r) 



(14) 



with 6* = 9*(z,y) — argmaxg hh{z,y\6) as the maxi- 
mized joint log-likelihood in |13| read as a function of z. 
The corresponding validation confidence interval is given 
by 

^'°(i?vaii|y) = {z|-2VPLSD(z|y) < ~2LL* {z,y) + zcdf (xj, a)} 

(15) 



vci:; 



Optimization of the likelihood |13] minimizes both, the 
contribution of the data RSS(y), and the mismatch 
with the fixed prediction value z. The model response 
F{Dp,:cd,e') obtained after this parameter optimization 
can be interpreted as a prediction z' satisfying the con- 
straint optimization problem [TU] considered for the pre- 
diction profile likelihood. It holds 

LL*(z,y;SD > oylil^Ii^ilDl ^ LL*(z',2/;SD = 0), 

(16) 
i.e. the validation profile likelihood LL* can be scaled to 
the prediction likelihood via 

PPL(z'ly) = VPL^°(z|y) - l^-^^^ (17) 

where z' = F{9*{z,y, SD > 0)) is the model response for 
9* estimated from z and y. 

Optimization with nonlinear constraints is a numerically 
challenging issue. Therefore, [17] provides a helpful way 
to omit constraint optimization. The VPL can be cal- 
culated with SD > like a common least-squares min- 
imization and is then afterwards rescaled to obtain the 
PCI for the true value. 
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Appendix A: Supplementary information 
1. Re-parametrization 

Parameter estimation, i.e. the prediction of a param- 
eter value out of experimental data, can be seen as a 
special case of a model prediction. Then, the parame- 
ter profile likelihood coincides with the prediction profile 
likelihood and the respective parameter confidence inter- 
vals correspond to prediction confidence intervals. In this 
sense, the prediction profile likelihood generalizes the pa- 
rameter profile likelihood. In fact, the idea of the predic- 
tion profile likelihood and the calculation of prediction 
confidence intervals, e.g. the choice of the threshold, is 
very intuitive for this special case. 

In other situations, an analog strategy would require a 
re-parametrization of the model in a way that the desired 
model prediction is unambiguously given by the value of a 
single new parameter. Then, again the profile likelihood 
for the new parameter would give a confidence interval 
for the prediction. In this case, without loss of generality 
such a parameter can be denoted by O'^. Then, the re- 
parametrization would be a transformation 



r:{0i,...,0„j^{0;, 



J 



(Al) 



of the np parameters 6 to new parameters O'l, . . . ,9'„ 
where all predictions for the condition I?prcd satisfy 



F (-Dprcd, 0) — F (I3prod, ^l) 



(A2) 



Here, F' — FoT^^ denotes the model for the transformed 
parameters. For a transformation satisfying |A2| . any 
change of the parameters 6*2, . . . , 0^ would not affect F', 
because T is chosen in a way that the effect of 6*2, ... , Ong 
is orthogonal to the effects of 9i . 

However, because ODE systems can only be solved an- 
alytically for special cases, such a re-parametrization can- 
not be found explicitly for most realistic models. This 
restriction can be resolved numerically by an implicit 
re-parametrization which is obtained by a constrained 
nonlinear optimization procedure. This idea yields the 
prediction profile likelihood 



PPL(z) = max 

ee{e|_F(_Dp,ed,0)=z} 



LL(y|0) (A3) 



which is obtained by maximization over the model pa- 
rameters satisfying the constraint that the model re- 
sponse F{D,6*) after fitting is equals to the considered 
value z for the prediction. In this case, the 'new parame- 
ter' is the predicted value itself, i.e. z = 9[ and F' is the 
identity function. 

Equation [A3| also resolves the formal issue which oc- 
curs if there is not a unique parameter set 9 given by the 
constraint F{Dprcd, 0) = z. If there are several such pa- 
rameter sets, the ambiguities either vanish by taking the 
parameter set with maximize the log-likelihood, or they 
are not relevant because only the value of the maximized 
log-likelihood enters the calculation. 



Fit 


the model M to the measurements obtaind 


for 


the 


design D. 


This model is considered as the 'truth'. 










Define the prediction design i^™.ed' 










Evaluate true model prediction Zfj^^e — P'i^pred)' 








for 


a noise realization e^ 
yi(L>) = SIMULATEJDATA (M, D, ej) 
PLLl = CALIBRATE_MDDEL {yi(D), M) 












PLLi(ztrue) = CALIBRATEJ10DEL_C0NSTRAINT 


iViiD), 


M, 


Hrue) 




epd/i = PLLiiztrue) " PLL^ 










threshold(a) = quantileQ(ecd/) 











FIG. 3: A Monte-Carlo algorithm for calculating the profile 
likelihood threshold empirically. New noise realizations yi{D) 
are utilized to calculate the distribution of PPLi(ztrue) — 
PPL*. The a quantile of this distribution can be used as 
a threshold for prediction confidence intervals instead of the 
asymptotic threshold, i.e. instead of the a quantile of the Xi 
distribution. 
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FIG. 4: The Monte-Carlo approach allows a comparison of 
the asymptotic thresholds with the empirically calculated, 
i.e. the correct thresholds. Here, the thresholds corresponding 
to 0.05, 0.1, . . . , 0.95, 0.99 confidence levels have been plotted 
for nine dilTerent prediction scenarios. In our example, the 
asymptotic thresholds are slightly too large for predictions of 
A{t) and B{t) which makes the asymptotic confidence inter- 
vals conservative. 



2. Profile likelihood threshold 

A suitable parameter transformation makes the predic- 
tion profile likelihood equivalent to the parameter profile 
likelihood. Therefore, the following discussion holds for 
both, for parameter and for prediction confidence inter- 
vals. 

In general, fitting a model to experimental data re- 
duces the residual sum of squares RSS. In the asymp- 



10 



totic case, i.e. for a large number of data points, it can be 
shown that the decrease of RSS due to fitting one param- 
eter is chi-square distributed with one degree of freedom. 
This result also holds exactly in the non-asymptotic case 
for linear parameters. This outcome is utilized to define 
the asym ptotic thres hold for profile likelihood confidence 
intervals |Raue2009l |. 

For nonlinear parameters, the distribution of the de- 
crease of the residual sum of squares by the parame- 
ter estimation procedure, has not yet been derived for 
the general setting. However, since the profile likelihood 
based confidence intervals are independe nt on bije ctive 



transformations of the parameter space |Coxl994j . the 
assumption also holds if there is such a transformation, 
which makes the parameter of interest linear at least 
within its confidence interval. Such a transformation only 
has to exist, it is not required to derive it analytically. 

A situation where such a transformation does not ex- 
ist occurs if the nonlinearity yields a non-monotone de- 
pendency of the profile likelihood, i.e. there are several 
local minima in the confidence interval. In this case, 
there is a larger decrease of the residual sum of squares 
and the standard threshold yields conservative results, 
i.e. the calculated confidence intervals are too large for 
the desired confidence level a. 

In Fig. [3J a procedure is presented for checking the 
standard threshold. It is a Monte-Carlo analysis of the 
impact of the nonlinear constraint used to calculate the 
prediction profile likelihood on the magnitude of overfit- 
ting. In Fig. 21 the asymptotic thresholds corresponding 
to a = 0.05,0.1, .. .,0.95, 0.99, i.e. the quantiles of the 
X^ distribution, are compared with the empirically cal- 
culated thresholds for several prediction scenarios. For 
the model 



A 



B 



C 



(A4) 



the asymptotic thresholds are slightly too large for pre- 
dicting A{t) and B{t) on the basis of measurements of 
C{t). This makes the asymptotic confidence intervals 
conservative. The impact on the coverage is discussed in 
the following section. 



3. Coverage 



The coverage 



C = Prob(F(i?pred,et™c) e PGI^{D\y)) 



(A5) 



is the probability that the PClQ(z|y) contains the true 
value F{Dpicd,Stiuc)- A desired property of any con- 
fidence interval is that the coverage coincides with the 
confidence level a. 

Fig. [5] shows the estimated coverage of the predic- 
tion confidence intervals calculated for nine different pre- 
diction scenarios. In these scenarios ^(2), J5(2), C(2), 
A{10), B{10), C(10), A(50), 5(50), C(50) have been pre- 
dicted, i.e. all three dynamic variables are predicted for 



1 

0.5 


1 

0.5 


1 
0.5 



A(2) 



B(2) 



1 

0.5 


0.5 10 

A(10) 



0(2) 



> 



11^^' 



V 



l»^ 



."IP"" 
■IJ'^ 05 



lU'^ 



*>• 



\^ 



>|H^ 



^y 



1 

0.5 



0.5 
B(10) 

ii'Ll" 




1 



,1*'^ 



0.5 



ilJ 



-I' 



0.5 
C(10) 



X 



0.5 10 

A(50) 










,>' 



0.5 
B(50) 




1 

1 



Jl^ 



0.5 
0(50) 



0.5 



y 



0.5 10 0.5 1 

confidence level 



FIG. 5: Coverage of the prediction confidence intervals for the 
consecutive model. The horizontal axis is the confidence level 
a — {0.05, 0.1, ..., 0.95, 0.99} which constitutes the desired 
coverage of the confidence intervals. The vertical axis is the 
realized coverage obtained for 100 data realizations. The red 
error bars are the result obtained for the asymptotic threshold 
which yield conservative outcomes for predictions of A{t) and 
B{t). The black error bars indicate the results for the Monte- 
Carlo thresholds which shows almost perfect agreement with 
the confidence level in all prediction scenarios. 



an early, an intermediate, and a late point in time. For 
this analysis, a hundred noise realizations have been ana- 
lyzed. The error bars plotted in this figure are bootstrap 
confidence intervals of the mean coverage. 

The coverage obtained for the asymptotic threshold 
(red) tends to be conservative, i.e. the true model re- 
sponse is inside the confidence interval more frequently 
as specified by the confidence level a. This means that 
there are more false negatives than intended which does 
not constitute a serious problem in terms of validity of 
conclusions. In contrast, an anti-conservative coverage 
would constitute an issue because an increased false pos- 
itive rate could lead to invalid reasoning. 

The coverage obtained by the adjusted thresholds ob- 
tained by the Monte-Carlo algorithm shown in Fig. [3] are 
displayed by the black error bars in Fig.[5l Here, the cov- 
erage coincides with the confidence level which confirms 
the validity respective the prediction profile likelihood 
based confidence intervals. 



4. Comparison of PCI and VCI 



The validation profile likelihood satisfies 
VPLSD(z|y)<PPL(zait|2/) + ^^^^|^ Vz.it (A6) 
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Panel (A) shows the comparison for SE > SD. In this 
case, the boundaries of the VCI and the PCI differ only 
by a value around 0.38. In Panel (B), SE is chosen equal 
to SD. In Panels (C) and (D) SE is further decreased. 
This corresponds to more informative data for predict- 
ing the exact value of z. In these cases, the optimum of 
the PPL is narrow in comparison to validation data error 
SD. Then, during fitting the model, a mismatch z — z* 
is predominantly explained by the observation error of 
the validation data point. The difference of the bound- 
aries of the confidence intervals increase and approach 
the 10% quantile of the Gaussian distribution, i.e. a value 
icdf(iV(0, SD = 1), .95) = 1.64 which is the one-sided 5% 
confidence interval for a validation data point for a con- 
stant model prediction, i.e. for SE —5- 0. 



FIG. 6: Comparison of prediction and validation confidence 
intervals. Panel (A) shows a prediction profile likelihood (red 
line) with a rather flat shape. Here, the curvature of the pre- 
diction profile likelihood corresponds to a prediction standard 
error SE = 2. In this case, the prediction confidence inter- 
vals are large (red shaded) and the increase of the vafidation 
confidence intervals (gray) is smaller than indicated by the 
validation data error SD. If the data is more informative, 
i.e. SE decreases (panels B-D), the slope of prediction profile 
likelihood increases yielding larger difference between the PCI 
and VCI. 



i.e. the VPL {z\y) is smaller than the right hand side of 
the inequality for any alternative predicted value Zait . On 
the one hand, the inequality can be utilized to interpret 
a difference between the respective confidence intervals. 
Furthermore, the equation can be utilized for consistency 
checks, e.g. to prove the numerically calculated VPL and 
PPL. Small differences between the size of the VCI 
and PCI indicate a flat prediction proflle likelihood close 
to the threshold whereas deviations of the confidence in- 
tervals in the order of magnitude of SD occur if the PPL 
has a large slope. This aspect is illustrated in the follow- 
ing. 

For illustration purpose, a quadratic prediction profile 
likelihood 



2PPL(z) = — ^ 
SE^ 



(A7) 



with SE e {0.1, 0.5, 1, 2} has been assumed. These four 
settings are shown in Fig. [51 The prediction profile likeli- 
hood is shown as a red line. For several Zait, the quadratic 
term in |A6| is plotted by blue curves attached to the 
PPL. The VPL constitutes the infimum of these curves 
which in this special case can be calculated analytically 
and is given by 



-2VPL''^(z) 




zSE^ 



SD (SD^ + SE^) 
(A8) 



5. Prior information 

If prior information about parameters is available, 
e.g. a prior distribution tt{9), maximum likelihood esti- 
mation is replaced by maximum a-posteriori (MAP) es- 
timation 

^MAP == argmaxp(2/|6')7r(6') (A9) 

6 

= argmax(LL(y|6i)-|-log(7r(6i))) , (AlO) 

8 

i.e. the parameters are estimated by maximizing the a- 
posterior probability of the data and the parameter esti- 
mates. For most common priors, MAP estimation can be 
performed by MLE using a penalized likelihood. As an 
example, a log-normal prior for a parameter component 
6' yields 



LL, 



1 (log(gO - (gp)^ 
^^ 2 Var(e') +^""^^ 



(All) 



To incorporate prior knowledge, the presented prediction 
profile likelihood approach has to be generalized to MAP 
estimation and the penalized likelihood |A11| is used in- 
stead of the standard log-likelihood LL. 

To illustrate the incorporation of prior knowledge for 
parameter values, the initial concentration A{0) — 63 is 
assumed to be drawn from a log-normal distribution 



93 ^ logN{0, 1) 



(A12) 



with expectation (log (6*3)) — and variance 
yar(log(6'3)) — 0.1. For parameter estimation, this 
is accounted for by using the penalized likelihood |A11| , 
i.e. by adding an additional term to the residual sum of 
squares. 

As in the example in the main text, the calculation 

of the prediction and validation confidence intervals has 

2been repeated for t — 0,10, ... , 100 and all three dynamic 

I states A{t), B{t), Cit). In this example, the true value 

of A{0) = 03 has been drawn according to the prior from 

the log- normal distribution |A12| . 
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Fig. [7] shows the prediction profile likelihood functions 
as curves in vertical direction as well as the respective 
90% prediction confidence intervals as dark gray shaded 
regions. The prediction confidence intervals plotted in 
light color are obtained if ^3 is estimated without prior 
information. Because C is the measured compound in 
our example, the prediction confidence intervals for C are 
much smaller than for A and B. However, also A and B 
yield bounded prediction confidence intervals which can 
be interpreted as observability of these dynamic states. 
Omitting the prior information yields larger prediction 
confidence intervals, especially for the unobserved states 
A{t) and B{t). 



the calculation of prediction and validation confidence 
intervals. 

Fig. [TU] shows the long-term dynamics of this model, 
i.e. the oscillations. In our analysis only the initial phase, 
i.e. the first 1000 seconds have been considered. This 
time interval is characterized by strong nonlinearity of 
the model response with respect to the parameters and 
constitutes a compromise setting between a transient and 
an oscillatory dynamics. 

Tab. 1 summari zes the model par ameters as they have 
been published in JKholodenko2000J . The Hill coefficient 
n is assumed to be equal to one. For the observational 
noise of the validation data, the same noise level as for the 
experimental data has been assumed, i.e. a = SD = 10. 



6. Validation profile likelihood for the consecutive 
reaction model 



9. Experimental design conclusions 



In the main text, prediction confidence intervals have 
been shown for the consecutive reaction model. Fig. [8] 
shows the corresponding validation profile and the re- 
spective validation confidence intervals for the same noise 
realization for all dynamic variables A{t), B{t), and C{t). 
Validation confidence intervals account for the measure- 
ment noise in a validation experiment. Therefore, they 
are larger than the prediction confidence intervals shown 
in the main text in Fig. 1, panels (c)-(e). 

Because Gaussian noise e ~ N{fi, SD'^) has been as- 
sumed, the validation confidence intervals covers negative 
values if the true model response ^ = F(I?prod: ^truo) is 
close to zero. 



7. Observability of the long-term dynamics 

In the main text, it has been discussed how to exploit 
the prediction profile likelihood for observability analy- 
ses. For the two step model 



A% B % C 



(A13) 



it has been shown that measurements of the compound 
C which sample only the transient increase and there- 
fore does not provide information about the steady state 
level lead to non-observability of A(i), and B{t) for i > 0. 
In addition to that result. Fig. |9] shows prediction confi- 
dence intervals for C{t) for times much larger than the 
measurement times i = 0, 2, . . . , 20. C{t) becomes prac- 
tically non-observable for times which are much larger 
than the time sampling interval. 



8. Characteristics of the MAP kinase model 

To demonstrate the applicability of our approach in 
a realisti c setting, the pub lished model of MAP kinase 
signaling (Kholodenko2000f has been utilized to illustrate 



The size of prediction confidence interval can be uti- 
lized to figure out informative experimental designs. If 
the information of a single data points is intended to be 
evaluated, then the validation confidence intervals are ap- 
propriate. If many experimental replicates are feasible, 
the average observation will have a small standard error 
and then prediction confidence intervals can be used to 
assess a design. 

Fig. [TT] shows the size of the 90% prediction confidence 
intervals (upper row), i.e. the difference between the up- 
per and lower boundary, and the size of the validation 
confidence intervals (lower row) along the time axis. The 
size is plotted in absolute concentrations (left panels) and 
relative to the total amount of the protein (right panels) . 

Independently from the way of the assessment, Erk 
(blue lines) yields the smallest prediction and validation 
confidence intervals for 300 < t < 1000. Therefore, mea- 
surements of Erk in this time interval constitute very 
informative experimental designs for testing the model. 
As discussed in the main text, such a setting which is 
appropriate for validating the whole model is not infor- 
mative for improving the model parameters. For such a 
purpose, new experimental data has to be generated for 
designs in which the model behavior is not yet precisely 
specified, i.e. for a setting with large prediction or vali- 
dation confidence intervals. In these terms, Erk** (gray 
lines) is most informative in absolute units between 100 
and 200 seconds. Also absolute measurements of Mek* 
(red lines) and Mek** (green lines) along the whole time 
axis are informative. 

If only the amount of a phosphorylated form relative to 
the total concentration of the protein is experimentally 
accessible, then the panels on the right should be eval- 
uated to assess the power of a design. In our example, 
the outcome for the prediction profile likelihood is very 
similar to the results obtained for absolute concentra- 
tions. Again, Erk for t < 200 as well as Mek* and Mek** 
are most informative. For the validation confidence in- 
tervals, Raf and Raf* appear as informative because the 
total concentration of Raf is three times smaller than the 
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symbol description 


value lower boundary 


upper boundary 


units 


Vi 


max. enzyme rate 


2.5 


le-8 


le6 


nMs~" 


n 


Hill coefficient of the feedback 


1 


1 


1 


1 


Ki 


Michaelis constant 


9 


le-8 


le6 


nM 


Ki 


Michaelis constant 


10 


le-8 


le6 


nM 


V2 


max. enzyme rate 


0.25 


le-8 


le6 


nM s"^ 


K2 


Michaelis constant 


8 


le-8 


le6 


nM 


k3 


catalytic rate constant 


0.025 


le-8 


le6 


nM s~^ 


Ki 


Michaelis constant 


15 


le-8 


le6 


nM 


ki 


catalytic rate constant 


0.025 


le-8 


le6 


nM s"^ 


Ki 


Michaelis constant 


15 


le-8 


le6 


nM 


V, 


max. enzyme rate 


0.75 


le-8 


le6 


nM s"^ 


Ks 


Michaelis constant 


15 


le-8 


le6 


nM 


Ve 


max. enzyme rate 


0.75 


le-8 


le6 


nM s~-^ 


Ke 


Michaelis constant 


15 


le-8 


le6 


nM 


kr 


catalytic rate constant 


0.025 


le-8 


le6 


nM s"-^ 


K7 


Michaelis constant 


15 


le-8 


le6 


nM 


kg 


catalytic rate constant 


0.025 


le-8 


le6 


nM s"^ 


Ks 


Michaelis constant 


15 


le-8 


le6 


nM 


V9 


max. enzyme rate 


0.5 


le-8 


le6 


nM s~"^ 


Kg 


Michaelis constant 


15 


le-8 


le6 


nM 


1^0 


max. enzyme rate 


0.5 


le-8 


le6 


nM s"-^ 


Kio 


max. enzyme rate 


15 


le-8 


le6 


nM 



TABLE I: Parameters of the MAP kinase model as published in [Kholodenko2000l | . 



total amount of Mek and Erk. 



10. Implementation 

The cvodes package from the Suite of Nonhnear 
Diffcrential/Alscbraic Equation Solvers (SUNDIALS) 
[Hindmarsli2005j has been used for the numerical integra- 
tion of the ODEs and the sensitivity equations. MAT- 
LAB 's fmincon optimizer was used to estimate the pa- 
rameters. The gradient and the Hessian of the objective 
function have been p rovided fo r the optimizer using the 
sensitivity equations Leisl988l | and the approximation 
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de^dOk 



LL; 



2-^ n-2 Pin, fin, 



af dOj dOu 



(A14) 



of the second derivatives |Pressl993| . Within a single 
optimization procedure, the parameters have been alter- 



natingly optimized on the logarithmic scale as well as the 
common linear scale until the optimizer converged to a 
common value on both scales. 

For the calculation of the validation profile likelihood, 
30 test predictions z within the reasonable range of pre- 
dictions given by the model structure and SD have been 
evaluated to obtain an initial guess of the likelihood 
shape. Then the grid is iteratively refined and/or en- 
larged until a smooth validation likelihood covering the 
whole confidence interval is obtained and local minima 
have been removed. For a single profile likelihood, around 
10^ to 10"^ optimizations were required. 

For the prediction profile likelihood the initial guess is 
obtained from the VPL by equation [17] in the main text. 
The gaps in this guess are then filled by nonlinear con- 
strained optimization. If the constraint optimization pro- 
cedure did not converge, the validation data error SD has 
been iteratively decreased by factors 10°, 10~^, . . . , 10^*^. 
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FIG. 7: The curves in vertical direction are the prediction profile likelihood functions for A{t) (left panel), B{t) (middle), and 
C{t) (right panel) if a log-normal prior for ^3 is assumed. The respective 90% confidence intervals are plotted in dark gray. 
The light gray regions indicate the 90% confidence intervals if the parameter Og is estimated without prior information. 






FIG. 8: The left panel shows the validation confidence intervals for the unobserved state A{t). The validation profile likelihood 
functions are plotted as curves in vertical direction. For the plotting the confidence intervals along the time axis, the VCIs have 
been interconnected by cubic piecewise interpolation. Validation confidence intervals and validation profile likelihood functions 
for the intermediate unobserved state B{t) are shown in the middle and for C{t) in the right panel. 




FIG. 9: Prediction confidence intervals for the extrapolation 
of C{t) to time points much larger than the measurement 
times. Because in this example the experimental design does 
not provide sufficient information about the steady state level 
of C{t), the prediction confidence intervals diverge and the 
steady state C{t) for i — 5- cx) is non-observable. 
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FIG. 10: The long-term dynamics of the MAP kinase model 
shows regular oscillations. In our analysis, the first 1000 sec- 
onds, as highlighted by gray background color, have been an- 
alyzed as a compromise between a transient and oscillatory 
dynamics. 
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FIG. 11: Size of the prediction confidence intervals for the 
dynamic states of the MAP kinase model. The left panels 
show the size of the confidence intervals in absolute units. In 
the panels on the right, the size is plotted relative to the total 
concentration of a protein. The upper row shows the results 
for the prediction confidence intervals, the lower row for the 
validation confidence intervals. 



